Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations collected in low-quality environments to generate high-quality TTS training data. Our pipeline leverages the cross-lingual generalization of denoising and speech enhancement models trained on English and applied to Indian languages. This results in IndicVoices-R (IV-R), the largest multilingual Indian TTS dataset derived from an ASR dataset, with 1,704 hours of high-quality speech from 10,496 speakers across 22 Indian languages.
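As a rough illustration of the kind of enhancement pass described above, the sketch below loads a noisy clip, applies a stand-in denoiser, and writes a peak-normalised output. The denoise function is a hypothetical placeholder for an English-trained speech enhancement model applied zero-shot to Indian-language audio; the actual IV-R pipeline, model choices, and sample rates are not reproduced here.

# Hypothetical sketch of a cross-lingual enhancement pass; `denoise` stands in
# for any English-trained speech enhancement model applied zero-shot.
import torch
import torchaudio

TARGET_SR = 16_000  # assumed working sample rate

def denoise(waveform: torch.Tensor) -> torch.Tensor:
    """Placeholder for an English-trained enhancement model (assumption)."""
    return waveform  # identity stand-in

def enhance_clip(in_path: str, out_path: str) -> None:
    waveform, sr = torchaudio.load(in_path)              # noisy ASR audio
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    waveform = waveform.mean(dim=0, keepdim=True)        # downmix to mono
    cleaned = denoise(waveform)                          # cross-lingual denoising step
    cleaned = cleaned / cleaned.abs().max().clamp(min=1e-8)  # peak-normalise
    torchaudio.save(out_path, cleaned, TARGET_SR)

if __name__ == "__main__":
    enhance_clip("noisy_asr_clip.wav", "tts_ready_clip.wav")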
From Linear to Spline-Based Classification: Developing and Enhancing SMPA for Noisy Non-Linear Datasets
Building upon the concepts and mechanisms used in the development of the Moving Points Algorithm (MPA), we explore how non-linear decision boundaries can be developed for classification tasks. First, we examine the classification performance of MPA and introduce some minor improvements to the original algorithm. We then discuss the concepts behind using cubic splines for classification with a similar learning mechanism, and finally analyze training results on synthetic datasets with known properties.
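The toy sketch below illustrates the general idea of a spline-based decision boundary with a point-moving update: a cubic spline through movable control points splits the plane, and the control points are pulled toward (and slightly past) misclassified samples. The data, update rule, and margin are illustrative assumptions, not the SMPA procedure itself.

# Toy spline-boundary classifier: nudge spline control points toward errors.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

# Synthetic 2-D data: class 1 lies above a noisy sine curve, class 0 below.
x0 = rng.uniform(0, 2 * np.pi, 400)
x1 = rng.uniform(-2.0, 2.0, 400)
X = np.column_stack([x0, x1])
y = (x1 > np.sin(x0) + 0.2 * rng.standard_normal(400)).astype(int)

# Control points of the boundary spline (x positions fixed, y values movable).
cx = np.linspace(0, 2 * np.pi, 8)
cy = np.zeros_like(cx)

def predict(points, cy):
    spline = CubicSpline(cx, cy)
    return (points[:, 1] > spline(points[:, 0])).astype(int)

lr, margin = 0.1, 0.1
for _ in range(300):
    wrong = predict(X, cy) != y
    if not wrong.any():
        break
    for i, cxi in enumerate(cx):
        near = wrong & (np.abs(X[:, 0] - cxi) < (cx[1] - cx[0]))
        if near.any():
            # Class-1 errors lie below the current spline, class-0 errors above it;
            # pull the control point slightly past them so they flip sides.
            target = X[near, 1] + np.where(y[near] == 1, -margin, margin)
            cy[i] += lr * (target.mean() - cy[i])

print("training accuracy:", (predict(X, cy) == y).mean())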
VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection
Chavan, Kunal, Balaji, Keertan, Barigidad, Spoorti, Chiluveru, Samba Raju
With an increasing demand for assistive technologies that promote the independence and mobility of visually impaired people, this study proposes an innovative real-time system that gives audio descriptions of a user's surroundings to improve situational awareness. The system acquires live video input and processes it with a quantized and fine-tuned Florence-2 large model, reduced to 4-bit precision for efficient operation on low-power edge devices such as the NVIDIA Jetson Orin Nano. By converting the video stream into frames with a 5-frame latency, the model provides rapid and contextually pertinent descriptions of objects, pedestrians, and barriers, together with their estimated distances. The system employs Parler TTS Mini, a lightweight and adaptable Text-to-Speech (TTS) solution, for efficient audio feedback. It accommodates 34 distinct speaker types and enables customization of speech tone, pace, and style to suit user requirements. This study examines the quantization and fine-tuning techniques used to adapt the Florence-2 model for this application, illustrating how the integration of a compact model architecture with a versatile TTS component improves real-time performance and user experience. The proposed system is assessed on its accuracy, efficiency, and usefulness, providing a viable option to help visually impaired users navigate their surroundings safely and effectively.
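A minimal skeleton of the capture-describe-speak loop is sketched below. The functions describe_frame and speak are hypothetical placeholders standing in for the quantized Florence-2 model and Parler TTS Mini respectively; model loading, 4-bit quantization, and distance estimation are omitted, and only the frame-sampling structure reflects the abstract's reported 5-frame latency.

# Skeleton of the capture -> describe -> speak loop; the two helpers below are
# placeholders, not the real model or TTS APIs.
import cv2

FRAME_STRIDE = 5  # one description every 5 frames, per the reported latency

def describe_frame(frame) -> str:
    """Placeholder: run the vision-language model and append distance estimates."""
    return "example description"

def speak(text: str) -> None:
    """Placeholder: synthesise and play audio feedback."""
    print(f"[audio] {text}")

def run(camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    frame_id = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_id % FRAME_STRIDE == 0:     # process every 5th frame
                speak(describe_frame(frame))
            frame_id += 1
    finally:
        cap.release()

if __name__ == "__main__":
    run()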
On the Development of Binary Classification Algorithm Based on Principles of Geometry and Statistical Inference
The aim of this paper is to investigate an attempt to build a binary classification algorithm using principles of geometry such as vectors, planes, and vector algebra. The basic idea behind the proposed algorithm is that a hyperplane can be used to completely separate a given set of data points mapped to n-dimensional space if the data points are linearly separable in those n dimensions. Since points are the foundational elements of any geometrical construct, by manipulating the positions of the points used to construct a given hyperplane, the position of the hyperplane itself can be manipulated. The paper includes testing the algorithm against other classifiers on a variety of standard machine learning datasets, with a focus on support vector machines (SVMs), since they and our proposed classifier use the same geometrical construct of a hyperplane, and the versatility of SVMs makes them a good benchmark for comparison. Since the algorithm focuses on moving points through the hyperspace to which the dataset has been mapped, it has been dubbed the Moving Points Algorithm.
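The toy sketch below conveys the core idea in two dimensions: a separating line is constructed from two anchor points, and the anchors themselves are nudged when samples are misclassified, thereby moving the boundary. The data and the specific update heuristic are illustrative assumptions rather than the algorithm's actual rule.

# Toy 2-D "moving points" classifier: the line is defined by two anchors, and
# the nearer anchor is shifted perpendicular to the line, toward the side of
# each misclassified sample.
import numpy as np

rng = np.random.default_rng(1)

# Linearly separable toy data: label by the side of the line x + y = 1.
X = rng.uniform(-1.0, 2.0, size=(300, 2))
y = np.sign(X[:, 0] + X[:, 1] - 1.0)
y[y == 0] = 1.0

p1 = np.array([0.0, 0.0])   # the two anchor points defining the boundary
p2 = np.array([1.0, 0.0])

def side(points, p1, p2):
    # Sign of the 2-D cross product (p2 - p1) x (point - p1): which side of the line.
    d = p2 - p1
    return np.sign(d[0] * (points[:, 1] - p1[1]) - d[1] * (points[:, 0] - p1[0]))

lr = 0.05
for _ in range(2000):
    wrong = np.flatnonzero(side(X, p1, p2) != y)
    if wrong.size == 0:
        break
    i = rng.choice(wrong)
    d = p2 - p1
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal of the current line
    step = -lr * y[i] * n        # push the boundary toward the misclassified sample
    # Move whichever anchor is nearer to the sample, rotating/translating the line.
    if np.linalg.norm(X[i] - p1) < np.linalg.norm(X[i] - p2):
        p1 = p1 + step
    else:
        p2 = p2 + step

print("training accuracy:", float((side(X, p1, p2) == y).mean()))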
A Comprehensive Insights into Drones: History, Classification, Architecture, Navigation, Applications, Challenges, and Future Trends
Singh, Ruchita, Kumar, Sandeep
Unmanned Aerial Vehicles (UAVs), commonly known as drones, are among the most transformative technologies of the 21st century. First developed for military use, advances in materials, electronics, and software have turned drones into multipurpose tools for a wide range of industries. In this paper, we cover the history, taxonomy, architecture, navigation systems, and application areas of drones. We explore important future trends such as autonomous navigation, AI integration, and obstacle avoidance systems, emphasizing how they contribute to improving the efficiency and versatility of drones. We also examine the major challenges, including technical, environmental, economic, regulatory, and ethical barriers, that limit the wider adoption of drones, as well as trends that are likely to mitigate these obstacles in the future. This work offers a structured synthesis of existing studies and perspectives, enabling insights into how drones will transform agriculture, logistics, healthcare, disaster management, and other areas, while also identifying new opportunities for innovation and development.
RNN-Based Models for Predicting Seizure Onset in Epileptic Patients
Mounagurusamy, Mathan Kumar, S, Thiyagarajan V, Rahman, Abdur, Chandak, Shravan, Balaji, D., Jallepalli, Venkateswara Rao
Early management and better clinical outcomes for epileptic patients depend on seizure prediction. The accuracy and false alarm rates of existing systems are often compromised by their dependence on static thresholds and basic Electroencephalogram (EEG) properties. A novel Recurrent Neural Network (RNN)-based method for seizure onset prediction is proposed in this article to overcome these limitations. In contrast to conventional techniques, the proposed system makes use of Long Short-Term Memory (LSTM) networks to extract temporal correlations from unprocessed EEG data. This enables the system to adapt dynamically to the unique EEG patterns of each patient, improving prediction accuracy. The methodology comprises thorough data collection, preprocessing, and LSTM-based feature extraction; annotated EEG datasets are then used for model training and validation. Results show a considerable reduction in false alarm rates (average of 6.8%) and an improvement in prediction accuracy (90.2% sensitivity, 88.9% specificity, and an AUC-ROC of 93%). Additionally, computational efficiency is significantly higher than that of existing systems (12 ms processing time, 45 MB memory consumption). With regard to improving seizure prediction reliability, these results demonstrate the effectiveness of the proposed RNN-based strategy, opening up possibilities for its practical application in improving epilepsy treatment.
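A minimal sketch of an LSTM classifier over EEG windows is shown below; the channel count, window length, architecture, and training loop are illustrative assumptions and do not reproduce the system described in the article.

# Minimal LSTM-based seizure-onset classifier over EEG windows
# of assumed shape (batch, time, channels).
import torch
import torch.nn as nn

class SeizureLSTM(nn.Module):
    def __init__(self, n_channels: int = 23, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            num_layers=2, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, 1)   # logit for "imminent seizure onset"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)              # (batch, time, hidden)
        return self.head(out[:, -1])       # use the last time step

model = SeizureLSTM()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: 8 windows of 256 time steps from 23 EEG channels (illustrative).
x = torch.randn(8, 256, 23)
labels = torch.randint(0, 2, (8, 1)).float()

optimizer.zero_grad()
loss = criterion(model(x), labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())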
Upstream flow geometries can be uniquely learnt from single-point turbulence signatures
Karunanethy, Mukesh, Rengaswamy, Raghunathan, Panchagnula, Mahesh V
We test the hypothesis that the microscopic temporal structure of near-field turbulence downstream of a sudden contraction contains geometry-identifiable information pertaining to the shape of the upstream obstruction. We measure a set of spatially sparse velocity time-series data downstream of differently-shaped orifices. We then train random forest multiclass classifier models on a vector of invariants derived from these time series. We test the above hypothesis with 25 somewhat similar orifice shapes to push the model to its limits. Remarkably, the algorithm was able to identify the orifice shape with 100% accuracy and 100% precision. This outcome is enabled by the uniqueness in the downstream temporal evolution of turbulence structures in the flow past orifices, combined with the random forests' ability to learn subtle yet discerning features in the turbulence microstructure. We are also able to explain the underlying flow physics that enables such classification by listing the invariant measures in the order of increasing information entropy. We show that the temporal autocorrelation coefficients of the time series are most sensitive to orifice shape and are therefore informative. The ability to identify changes in system geometry without the need for physical disassembly offers tremendous potential for flow control and system identification. Furthermore, the proposed approach could potentially have significant applications in other unrelated fields as well, by deploying the core methodology of training random forest classifiers on vectors of invariant measures obtained from time-series data.
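The core methodology can be sketched as follows: each time series is reduced to a vector of invariant measures (here only a few autocorrelation coefficients and higher moments, a simplified stand-in for the paper's feature set) on which a random forest classifier is trained. The data below is synthetic and purely illustrative, not the measured velocity signals.

# Sketch: time series -> invariant feature vector -> random forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def invariants(series: np.ndarray, n_lags: int = 10) -> np.ndarray:
    s = (series - series.mean()) / series.std()
    acf = [np.mean(s[:-k] * s[k:]) for k in range(1, n_lags + 1)]  # autocorrelation coefficients
    moments = [np.mean(s**3), np.mean(s**4)]                       # higher-order moments
    return np.array(acf + moments)

# Synthetic stand-in: 5 "shapes", each producing AR(1)-like signals whose
# correlation depends on the shape label (illustrative only).
rng = np.random.default_rng(0)
X, y = [], []
for label in range(5):
    rho = 0.5 + 0.08 * label
    for _ in range(80):
        noise = rng.standard_normal(2000)
        series = np.zeros(2000)
        for t in range(1, 2000):
            series[t] = rho * series[t - 1] + noise[t]
        X.append(invariants(series))
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))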
Integrative CAM: Adaptive Layer Fusion for Comprehensive Interpretation of CNNs
Singh, Aniket K., Chaudhuri, Debasis, Singh, Manish P., Chattopadhyay, Samiran
With the growing demand for interpretable deep learning models, this paper introduces Integrative CAM, an advanced Class Activation Mapping (CAM) technique aimed at providing a holistic view of feature importance across Convolutional Neural Networks (CNNs). Traditional gradient-based CAM methods, such as Grad-CAM and Grad-CAM++, primarily use final layer activations to highlight regions of interest, often neglecting critical features derived from intermediate layers. Integrative CAM addresses this limitation by fusing insights across all network layers, leveraging both gradient and activation scores to adaptively weight layer contributions, thus yielding a comprehensive interpretation of the model's internal representation. Our approach includes a novel bias term in the saliency map calculation, a factor frequently omitted in existing CAM techniques, but essential for capturing a more complete feature importance landscape, as modern CNNs rely on both weighted activations and biases to make predictions. Additionally, we generalize the alpha term from Grad-CAM++ to apply to any smooth function, expanding CAM applicability across a wider range of models. Through extensive experiments on diverse and complex datasets, Integrative CAM demonstrates superior fidelity in feature importance mapping, effectively enhancing interpretability for intricate fusion scenarios and complex decision-making tasks. By advancing interpretability methods to capture multi-layered model insights, Integrative CAM provides a valuable tool for fusion-driven applications, promoting the trustworthy and insightful deployment of deep learning models.
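A simplified sketch of multi-layer CAM fusion in this spirit is given below: per-layer Grad-CAM maps are combined with adaptive weights derived from gradient and activation magnitudes. The bias term and generalized alpha described in the abstract are omitted, and the backbone, layer choice, and weighting scheme are illustrative assumptions rather than the Integrative CAM formulation.

# Simplified multi-layer CAM fusion: per-layer Grad-CAM maps weighted by a
# gradient-times-activation layer score.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # untrained; mechanics only
layers = [model.layer2, model.layer3, model.layer4]

acts, grads = {}, {}

def hook(name):
    def fn(module, inputs, output):
        acts[name] = output
        output.register_hook(lambda g: grads.__setitem__(name, g))
    return fn

handles = [layer.register_forward_hook(hook(str(i))) for i, layer in enumerate(layers)]

x = torch.randn(1, 3, 224, 224, requires_grad=True)
logits = model(x)
cls = logits.argmax(dim=1).item()
logits[0, cls].backward()                      # gradients of the top class score

maps, scores = [], []
for name in acts:
    a, g = acts[name], grads[name]             # (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True) # channel weights, Grad-CAM style
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    maps.append(F.interpolate(cam, size=(224, 224), mode="bilinear",
                              align_corners=False))
    scores.append((g.abs() * a.abs()).mean())  # layer importance from grad x activation

scores = torch.softmax(torch.stack(scores), dim=0)
fused = sum(s * m for s, m in zip(scores, maps)).squeeze()
fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
print("fused saliency map shape:", fused.shape)

for h in handles:
    h.remove()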
An Integrated Deep Learning Framework for Effective Brain Tumor Localization, Segmentation, and Classification from Magnetic Resonance Images
V, Pandiyaraju, Venkatraman, Shravan, A, Abeshek, A, Aravintakshan S, S, Pavan Kumar, S, Madhan
Tumors in the brain result from abnormal cell growth within the brain tissue, arising from various types of brain cells. When left undiagnosed, they lead to severe neurological deficits such as cognitive impairment, motor dysfunction, and sensory loss. As the tumor grows, it causes an increase in intracranial pressure, potentially leading to life-threatening complications such as brain herniation. Therefore, early detection and treatment are necessary to manage the complications caused by such tumors to slow down their growth. Numerous works involving deep learning (DL) and artificial intelligence (AI) are being carried out to assist physicians in early diagnosis by utilizing the scans obtained through Magnetic Resonance Imaging (MRI). Our research proposes DL frameworks for localizing, segmenting, and classifying the grade of these gliomas from MRI images to solve this critical issue. In our localization framework, we enhance the LinkNet framework with a VGG19-inspired encoder architecture for improved multimodal tumor feature extraction, along with spatial and graph attention mechanisms to refine feature focus and inter-feature relationships. Following this, we integrated the SeResNet101 CNN model as the encoder backbone into the LinkNet framework for tumor segmentation, which achieved an IoU Score of 96%. To classify the segmented tumors, we combined the SeResNet152 feature extractor with an Adaptive Boosting classifier, which yielded an accuracy of 98.53%. Our proposed models demonstrated promising results, with the potential to advance medical AI by enabling early diagnosis and providing more accurate treatment options for patients.
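The classification stage can be sketched as a two-step pipeline: pooled CNN features feed an Adaptive Boosting classifier. In the sketch below a torchvision ResNet-50 stands in for the SeResNet152 extractor, and the images and labels are random placeholders used only to show the wiring, not the paper's trained models or data.

# Two-stage sketch: CNN feature extraction followed by an AdaBoost classifier.
import numpy as np
import torch
import torchvision
from sklearn.ensemble import AdaBoostClassifier

backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()      # expose the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    return backbone(images).cpu().numpy()

# Random stand-in for segmented tumor crops and their grade labels.
images = torch.randn(40, 3, 224, 224)
labels = np.random.randint(0, 3, size=40)   # e.g. three tumor grades

features = extract_features(images)
clf = AdaBoostClassifier(n_estimators=100).fit(features, labels)
print("training accuracy:", clf.score(features, labels))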